This document is the summary of the Introduction to R workshop.
All correspondence related to this document should be addressed to:
Omid Ghasemi (Macquarie University, Sydney, NSW, 2109, AUSTRALIA)
Email: omidreza.ghasemi@hdr.mq.edu.au
The aim of the study is to test whether simple arguments are more effective in belief revision than more complex arguments. To that end, we present participants with an imaginary scenario (two alien creatures on a planet) and a theory (one creature is a predator and the other is its prey), and we ask them to rate the likelihood that the theory is true based on a simple fact (we adapted this method from Gregg et al., 2017; see the original study here). Then, in a between-subjects manipulation, participants are presented with either 6 simple arguments (Modus Ponens conditionals) or 6 more complex arguments (Modus Tollens conditionals), and they are asked to rate the likelihood that the initial theory is true at 7 stages.
The first stage is the base rating stage. The next three stages present arguments that support the theory, and the last three stages present arguments that disprove it. We hypothesized that the group with simple arguments would show better persuasion (reflected in higher ratings after the supportive arguments) and better dissuasion (reflected in lower ratings after the opposing arguments).
In the last part of the study, participants will be asked to complete several cognitive capacity/style measures, including thinking style (CRT), open-mindedness (AOT-E), reasoning ability (mindware), and numeracy scales. We hypothesized that cognitive ability, cognitive style, and open-mindedness are positive predictors of persuasion and dissuasion. These associations should be more pronounced for participants in the group with complex arguments, because the ability and willingness to engage in deliberative thinking may help participants assess the underlying logical structure of those arguments. For participants in the simple group, however, the logical structure of the arguments is more evident, so participants with lower ability can still assess the logical status of those arguments.
Thus, our hypotheses for this experiment are as follows:
Participants in the group with simple arguments have higher ratings for supportive arguments (They are more easily persuaded than those in the group with complex arguments).
Participants in the group with simple arguments have lower ratings for opposing arguments (They are more easily dissuaded than those in the group with complex arguments).
Thinking style (CRT), open-mindedness (AOT-E), reasoning ability (mindware), and numeracy are significantly associated with both the persuasion and dissuasion indexes in each group and in the entire sample. These associations should be stronger, although not significantly, for participants in the group with complex arguments.
First, we need to design the experiment. For this experiment, we use an online platform for data collection. There are several options, such as Gorilla, jsPsych, Qualtrics, and PsychoJS (Pavlovia). Since we do not need any reaction time data, we simply use Qualtrics. For an overview of different lab-based and online platforms, see here.
Next, we need to decide on the number of participants (sample size). For this study, we do not use a power analysis since we cannot access more than 120 participants. However, it is highly recommended to calculate the sample size using power estimation. You can find some nice tutorials on how to do that here, here, and here.
After we have created the experiment and decided on the sample size, the next step is to preregister the study. However, it is better to first run a pilot with 4 or 5 participants, clean all the data, run the desired analysis, and then preregister the analysis together with the code. You can find the preregistration form for the current study here.
Finally, we need to restructure our project in a tidy folder with different sub-folders. Having a clean and tidy folder structure can save us! There are different formats of folder structure (for example, see here and here), but for now, we use the following structure:
# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen = 999) # turn off scientific notation
options(contrasts = c('contr.sum','contr.poly')) # use sum-to-zero contrasts globally
options(knitr.kable.NA = '')
Artwork by Allison Horst: https://github.com/allisonhorst/stats-illustrations
R can be used as a calculator. Keep the order of operations in mind: R follows the usual precedence rules, so use parentheses to make the intended order explicit.
10 + 10
## [1] 20
4 ^ 2
## [1] 16
(250 / 500) * 100
## [1] 50
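The order of operations can be checked with a few lines. This is only a minimal sketch, and the variable names are made up for illustration:

```r
# Multiplication is evaluated before addition
no_parens <- 2 + 3 * 4      # 3 * 4 first, then + 2
with_parens <- (2 + 3) * 4  # parentheses force the addition first

# Exponentiation is evaluated before the unary minus
neg_power <- -2 ^ 2         # same as -(2 ^ 2)
power_of_neg <- (-2) ^ 2    # the minus is applied first

c(no_parens, with_parens, neg_power, power_of_neg)
## [1] 14 20 -4  4
```

When in doubt, adding parentheses costs nothing and makes the intention clear to the reader.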
R is flexible with spacing between numbers and operators (but spaces are not allowed inside variable names or keywords):
10+10
## [1] 20
10 + 10
## [1] 20
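R can sometimes tell that you’re not finished yet. If a line ends with an operator, the console shows a + prompt and waits for the rest of the expression: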
10 +
How do we create a variable? Values are assigned to variables using <- or =. Note that R is case sensitive everywhere: pay and Pay are two different names.
pay <- 250
month = 12
pay * month
## [1] 3000
salary <- pay * month
A few points on naming variables and vectors: use short, informative words, and stick to one convention (e.g., avoid capital letters and separate words with only _ or only .).
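As a small illustration of these naming points, the sketch below uses one consistent convention (lowercase words joined by _) and shows case sensitivity in action. All names here are invented for the example:

```r
# Lowercase words joined by underscores: short, informative, consistent
monthly_pay <- 250
n_months <- 12
yearly_salary <- monthly_pay * n_months
yearly_salary
## [1] 3000

# R is case sensitive: these are two different variables
pay <- 250
Pay <- 300
pay == Pay
## [1] FALSE
```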
A function is a set of statements combined to perform a specific task. When we use the same block of code repeatedly, we can convert it into a function. To write a function, you first need to define it:
my_multiplier <- function(a,b){
result = a * b
return (result)
}
Defining the function does not produce any output. To get a result, you need to call it:
my_multiplier (a=2, b=4)
## [1] 8
# or: my_multiplier (2, 4)
We can set a default value for our arguments:
my_multiplier2 <- function(a,b=4){
result = a * b
return (result)
}
my_multiplier2 (a=2)
## [1] 8
# or: my_multiplier2 (2)
# to override the default: my_multiplier2 (2, 6), which returns 12
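To illustrate turning a repeated calculation into a function, the sketch below wraps the percentage calculation from the calculator examples. The function name to_percent and its digits argument are hypothetical and chosen only for this example:

```r
# Hypothetical helper: percentage of `part` relative to `whole`
to_percent <- function(part, whole, digits = 1) {
  result <- (part / whole) * 100
  round(result, digits = digits)  # the last evaluated value is returned
}

to_percent(250, 500)
## [1] 50
to_percent(1, 3)
## [1] 33.3
to_percent(1, 3, digits = 0)
## [1] 33
```

Note that an explicit return() is optional: R returns the value of the last expression evaluated in the function body.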
Fortunately, you do not need to write everything from scratch. R has lots of built-in functions that you can use:
round(54.6787)
## [1] 55
round(54.5787, digits = 2)
## [1] 54.58
Use ? before the function name to get some help. For example, ?round. You will see many functions in the rest of the workshop.
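A few related built-in functions are sketched below for comparison with round(). The variable x is just an example value:

```r
x <- 54.5787

floor(x)      # largest integer not greater than x
## [1] 54
ceiling(x)    # smallest integer not less than x
## [1] 55
signif(x, 3)  # keep 3 significant digits
## [1] 54.6

args(round)   # show the arguments a function accepts
```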
The function class() shows the type of a variable.
TRUE and FALSE can be abbreviated as T and F. They must be written in capitals; true is not a logical value:
class(TRUE)
## [1] "logical"
class(F)
## [1] "logical"
class(2)
## [1] "numeric"
class(13.46)
## [1] "numeric"
class("ha ha ha ha")
## [1] "character"
class("56.6")
## [1] "character"
class("TRUE")
## [1] "character"
Can we change the type of data in a variable? Yes, with the as.---() family of functions, such as as.numeric() and as.character():
as.numeric(TRUE)
## [1] 1
as.character(4)
## [1] "4"
as.numeric("4.5")
## [1] 4.5
as.numeric("Hello")
## Warning: NAs introduced by coercion
## [1] NA
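Because a failed conversion produces NA rather than an error, it can help to check for missing values afterwards. The following is a minimal sketch with made-up input; suppressWarnings() is used only to keep the output short:

```r
raw_scores <- c("4.5", "7", "Hello")

# "Hello" cannot be converted, so it becomes NA
scores <- suppressWarnings(as.numeric(raw_scores))
scores
## [1] 4.5 7.0  NA

is.na(scores)               # which entries failed to convert?
## [1] FALSE FALSE  TRUE
mean(scores)                # NA propagates through calculations
## [1] NA
mean(scores, na.rm = TRUE)  # drop NA values before averaging
## [1] 5.75
```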
Vector: an object that stores more than one value of the same type (for example, several numbers or several strings). Use the combine function c() to create one.
sale <- c(1, 2, 3,4, 5, 6, 7, 8, 9, 10) # also sale <- c(1:10)
sale <- c(1:10)
sale * sale
## [1] 1 4 9 16 25 36 49 64 81 100
Subsetting a vector:
days <- c("Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
days[2]
## [1] "Sunday"
days[-2]
## [1] "Saturday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday"
days[c(2, 3, 4)]
## [1] "Sunday" "Monday" "Tuesday"
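Besides positions, vectors can also be subset with logical conditions. The sketch below recreates the sale and days vectors so that it runs on its own:

```r
sale <- 1:10
days <- c("Saturday", "Sunday", "Monday", "Tuesday",
          "Wednesday", "Thursday", "Friday")

sale[sale > 5]                           # keep values that meet a condition
## [1]  6  7  8  9 10
days[c(-1, -7)]                          # drop the first and the last element
## [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"
which(days == "Monday")                  # position of a matching element
## [1] 3
days[days %in% c("Saturday", "Sunday")]  # keep elements found in a set
## [1] "Saturday" "Sunday"
```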
Create a vector named my_vector with numbers from 0 to 1000 in it:
my_vector <- (0:1000)
mean(my_vector)
## [1] 500
median(my_vector)
## [1] 500
min(my_vector)
## [1] 0
range(my_vector)
## [1] 0 1000
class(my_vector)
## [1] "integer"
sum(my_vector)
## [1] 500500
sd(my_vector)
## [1] 289.1081
List: allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, or even other lists.
my_list = list(sale, 1, 3, 4:7, "HELLO", "hello", FALSE)
my_list
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4 5 6 7
##
## [[5]]
## [1] "HELLO"
##
## [[6]]
## [1] "hello"
##
## [[7]]
## [1] FALSE
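The [[1]], [[2]], ... markers in the output hint at how list elements are accessed. The sketch below rebuilds the list so that it runs on its own; named_list is an extra, hypothetical example showing access by name:

```r
my_list <- list(1:10, 1, 3, 4:7, "HELLO", "hello", FALSE)

my_list[[4]]        # double brackets return the element itself
## [1] 4 5 6 7
class(my_list[4])   # single brackets return a list of length 1
## [1] "list"
my_list[[4]][2]     # second value of the fourth element
## [1] 5

# Elements can also be named and accessed with $
named_list <- list(sale = 1:10, greeting = "hello")
named_list$greeting
## [1] "hello"
```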
Factor: A factor stores a vector together with its distinct values (the levels) as labels. The labels are always stored as character strings, whether the original values were numeric or character. For example, a variable gender with “male” and “female” entries:
gender <- c("male", "male", "male", " female", "female", "female")
gender <- factor(gender)
R now treats gender as a nominal (categorical) variable: 1=female, 2=male internally (alphabetically).
summary(gender)
## female female male
## 1 2 3
Question: why did summary() show three levels rather than two? Hint: run ‘gender’.
gender
## [1] male male male female female female
## Levels: female female male
So, be careful of spaces!
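One possible way to guard against stray spaces is to trim them with the base R function trimws() before creating the factor. This is a minimal sketch; the names gender_raw and gender_clean are chosen only for illustration:

```r
gender_raw <- c("male", "male", "male", " female", "female", "female")

# Remove leading and trailing whitespace, then convert to a factor
gender_clean <- factor(trimws(gender_raw))

levels(gender_clean)
## [1] "female" "male"
nlevels(gender_clean)
## [1] 2
sum(gender_clean == "female")
## [1] 3
```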
Create a gender factor with 30 males and 40 females (Hint: use the rep() function):
gender <- c(rep("male",30), rep("female", 40))
gender <- factor(gender)
gender
## [1] male male male male male male male male male male
## [11] male male male male male male male male male male
## [21] male male male male male male male male male male
## [31] female female female female female female female female female female
## [41] female female female female female female female female female female
## [51] female female female female female female female female female female
## [61] female female female female female female female female female female
## Levels: female male
There are two types of categorical variables: nominal and ordinal. How do we create ordered factors (when the variable is ordinal, that is, its values can be ranked)? We add two arguments to the factor() function: ordered = TRUE and levels = c("level1", "level2"). For example, we have a vector that shows participants’ education level:
edu<-c(3,2,3,4,1,2,2,3,4)
education<-factor(edu, ordered = TRUE)
levels(education) <- c("Primary school","high school","College","Uni graduated")
education
## [1] College high school College Uni graduated
## [5] Primary school high school high school College
## [9] Uni graduated
## Levels: Primary school < high school < College < Uni graduated
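The same ordered factor can also be built in a single factor() call by passing levels, labels, and ordered together. The sketch below is one way to do it, and it also shows that ordered factors support comparisons:

```r
edu <- c(3, 2, 3, 4, 1, 2, 2, 3, 4)

# levels gives the order of the codes, labels gives their names
education <- factor(edu,
                    levels = 1:4,
                    labels = c("Primary school", "high school",
                               "College", "Uni graduated"),
                    ordered = TRUE)

sum(education >= "College")   # participants with College or higher
## [1] 5
as.character(max(education))  # highest level present in the data
## [1] "Uni graduated"
```

Comparisons such as >= only make sense for ordered factors; for an unordered factor R returns NA with a warning.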
We have a factor with patient and control values. Here, the first level is control and the second level is patient. Change the order of levels, so patient would be the first level:
health_status <- factor(c(rep('patient',5),rep('control',5)))
health_status
## [1] patient patient patient patient patient control control control
## [9] control control
## Levels: control patient
health_status_reordered <- factor(health_status, levels = c('patient','control'))
health_status_reordered
## [1] patient patient patient patient patient control control control
## [9] control control
## Levels: patient control
Finally, can you relabel both levels so that they start with a capital letter? (Hint: check ?factor)
health_status_relabeled <- factor(health_status, levels = c('patient','control'), labels = c('Patient','Control'))
health_status_relabeled
## [1] Patient Patient Patient Patient Patient Control Control Control
## [9] Control Control
## Levels: Patient Control
Matrices: All columns in a matrix must have the same mode (numeric, character, etc.) and the same length. A matrix can be created by passing a vector to the matrix() function, which fills the values column by column by default.
my_matrix = matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)
my_matrix
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
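The output above shows the values filled column by column. To fill row by row instead, matrix() has a byrow argument. The sketch below compares the two and shows [row, column] indexing; the names by_col and by_row are illustrative:

```r
by_col <- matrix(1:9, nrow = 3, ncol = 3)                # default: column-wise
by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)  # row-wise

by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

by_col[2, 3]  # row 2, column 3
## [1] 8
by_row[2, 3]
## [1] 6

identical(by_row, t(by_col))  # filling by row equals transposing
## [1] TRUE
```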
Data frames: two-dimensional objects that can hold numeric, character, or logical values. Within a column all elements have the same data type, but different columns can have different data types. Let’s create a data frame:
id <- 1:200
group <- c(rep("Psychotherapy", 100), rep("Medication", 100))
response <- c(rnorm(100, mean = 30, sd = 5),
rnorm(100, mean = 25, sd = 5))
my_dataframe <-data.frame(Patient = id,
Treatment = group,
Response = response)
Equivalently, we could have created the data frame in a single call:
my_dataframe <-data.frame(Patient = c(1:200),
Treatment = c(rep("Psychotherapy", 100), rep("Medication", 100)),
Response = c(rnorm(100, mean = 30, sd = 5),
rnorm(100, mean = 25, sd = 5)))
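Because rnorm() draws random numbers, the Response values differ on every run. If reproducible values are wanted, one option is to call set.seed() first. The seed value 2024 below is arbitrary and not part of the original workshop:

```r
set.seed(2024)  # arbitrary seed, chosen only for illustration
first_draw <- rnorm(3, mean = 30, sd = 5)

set.seed(2024)  # resetting the seed reproduces the same numbers
second_draw <- rnorm(3, mean = 30, sd = 5)

identical(first_draw, second_draw)
## [1] TRUE

# The same idea applied to the data frame
set.seed(2024)
my_dataframe <- data.frame(Patient = 1:200,
                           Treatment = rep(c("Psychotherapy", "Medication"),
                                           each = 100),
                           Response = c(rnorm(100, mean = 30, sd = 5),
                                        rnorm(100, mean = 25, sd = 5)))
str(my_dataframe)  # compact overview of columns and their types
```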
In large data sets, the function head() shows the first observations of a data frame. Similarly, the function tail() prints the last observations in your data set.
head(my_dataframe)
tail(my_dataframe)
| Patient | Treatment | Response |
|---|---|---|
| 1 | Psychotherapy | 33.95265 |
| 2 | Psychotherapy | 36.59023 |
| 3 | Psychotherapy | 31.35356 |
| 4 | Psychotherapy | 31.28723 |
| 5 | Psychotherapy | 26.07646 |
| 6 | Psychotherapy | 32.55705 |
| | Patient | Treatment | Response |
|---|---|---|---|
| 195 | 195 | Medication | 25.43623 |
| 196 | 196 | Medication | 29.84969 |
| 197 | 197 | Medication | 20.78075 |
| 198 | 198 | Medication | 24.71622 |
| 199 | 199 | Medication | 19.27959 |
| 200 | 200 | Medication | 28.21100 |
Similar to vectors and matrices, brackets [] are used to select data from the rows and columns of a data frame, in the form [row, column]:
my_dataframe[35, 3]
## [1] 27.07054
How can we get all columns, but only for the first 10 participants?
my_dataframe[1:10, ]
| Patient | Treatment | Response |
|---|---|---|
| 1 | Psychotherapy | 33.95265 |
| 2 | Psychotherapy | 36.59023 |
| 3 | Psychotherapy | 31.35356 |
| 4 | Psychotherapy | 31.28723 |
| 5 | Psychotherapy | 26.07646 |
| 6 | Psychotherapy | 32.55705 |
| 7 | Psychotherapy | 28.67988 |
| 8 | Psychotherapy | 24.10665 |
| 9 | Psychotherapy | 39.00802 |
| 10 | Psychotherapy | 25.32071 |
How to get only the Response column for all participants?
my_dataframe[ , 3]
## [1] 33.95265 36.59023 31.35356 31.28723 26.07646 32.55705 28.67988
## [8] 24.10665 39.00802 25.32071 29.40034 25.34399 32.76062 24.84760
## [15] 28.19872 28.07090 33.18354 36.96603 33.57646 24.70220 27.52271
## [22] 26.67792 32.18050 31.79190 31.26808 36.97686 27.28482 21.81082
## [29] 27.65397 36.16490 28.24979 27.91537 30.50115 26.67929 27.07054
## [36] 32.97979 26.86852 30.00099 30.31407 34.00020 24.11383 27.30582
## [43] 33.42166 33.58378 28.59359 25.82842 30.22732 32.52653 25.45891
## [50] 28.48568 29.84932 31.79137 26.83967 30.06615 33.60250 32.54311
## [57] 26.72521 30.82507 28.43880 32.63311 23.47622 31.74743 28.90358
## [64] 25.45604 28.29429 27.07143 29.39342 21.46206 35.40375 28.75179
## [71] 40.10787 33.33897 21.38696 23.84687 32.67429 24.74431 27.99605
## [78] 31.65217 21.85429 27.63721 28.65842 25.78996 25.96648 36.08914
## [85] 30.07535 23.89873 22.34546 43.54409 41.41673 38.58313 39.84696
## [92] 33.26880 32.53167 36.67067 24.99626 29.00689 21.53824 34.32791
## [99] 28.35649 22.87029 26.81499 28.86468 41.85022 22.29191 19.12864
## [106] 14.99090 25.58605 27.88090 34.86319 22.19514 14.80459 25.60687
## [113] 23.95901 23.47584 31.87240 23.42361 25.67122 25.78058 20.18780
## [120] 20.76465 28.69680 22.73116 28.05487 21.02846 24.61480 13.43809
## [127] 27.53525 32.95569 25.25064 21.50898 26.96032 24.09697 20.81215
## [134] 27.54453 17.49910 24.61833 21.66894 23.55843 27.23249 23.71856
## [141] 27.43508 26.14407 17.80355 30.47530 21.45722 30.94322 26.31791
## [148] 20.19915 20.74641 20.07306 27.43122 20.07770 29.10473 18.31211
## [155] 25.30474 14.69377 21.25549 23.84503 31.58354 29.37609 31.40735
## [162] 22.75052 23.72945 22.89517 15.26411 21.29202 19.44516 24.58160
## [169] 29.24478 23.47076 32.63926 20.30651 25.76974 24.58956 25.29660
## [176] 32.78838 24.80085 27.29078 21.09988 21.69405 22.56959 26.02983
## [183] 27.50900 17.92092 17.90916 25.43834 25.16679 29.05139 26.00173
## [190] 24.11655 21.03248 26.37766 23.69654 25.15137 25.43623 29.84969
## [197] 20.78075 24.71622 19.27959 28.21100
An easier way to select particular columns is to use their names, which is more convenient than column numbers in large data sets:
my_dataframe[ , "Response"]
# OR:
my_dataframe$Response
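Rows can be selected by a condition in the same way. The sketch below rebuilds a comparable data frame (with an arbitrary seed, so the values will not match the tables above) and then filters it. The names medication, high_responders, and group_means are illustrative:

```r
set.seed(2024)  # arbitrary seed; values differ from the output shown earlier
my_dataframe <- data.frame(Patient = 1:200,
                           Treatment = rep(c("Psychotherapy", "Medication"),
                                           each = 100),
                           Response = c(rnorm(100, mean = 30, sd = 5),
                                        rnorm(100, mean = 25, sd = 5)))

# All columns, but only rows from the Medication group
medication <- my_dataframe[my_dataframe$Treatment == "Medication", ]
nrow(medication)
## [1] 100

# Two columns, only rows with a Response above 30
high_responders <- my_dataframe[my_dataframe$Response > 30,
                                c("Patient", "Response")]

# Mean Response per treatment group
group_means <- tapply(my_dataframe$Response, my_dataframe$Treatment, mean)
group_means
```

The condition goes before the comma because it selects rows; leaving the part after the comma empty keeps all columns.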